GEDANKEN is an experimental programming language with the
following characteristics. (1) Any value which is permitted in
some context of the language is permissible in any other
meaningful context. In particular, functions and labels are
permissible results of functions and values of variables.
(2) Assignment and indirect addressing are formalized by
introducing values, called references, which in turn possess
other values. The assignment operation always affects the
relation between some reference and its value. (3) All compound
data structures are treated as functions. (4) Type declarations
are not permitted.

The functional approach to data structures and the use of 
references insure that any process which accepts some data 
structure will accept any logically equivalent structure, regard- 
less of its internal representation. More generally, any data 
structure may be implicit; i.e. it may be specified by giving an
arbitrary algorithm for computing or accessing its components.
The existence of label variables permits the construction of co- 
routines, quasi-parallel processes, and other unorthodox control 
mechanisms. 

A variety of programming examples illustrates the generality 
of the language. Limitations and possible extensions are dis- 
cussed briefly. 
Introduction

The recent development of programming languages 
suggests that the simultaneous achievement of simplicity 
and generality in language design is a serious unsolved 
problem. This paper describes an experimental language, 
called GEDANKEN, which was developed to attack this 
problem. 

GEDANKEN is not intended to be a generally useful
language, although it could be effective in situations where a
fair degree of object program inefficiency is tolerable. Its
major purpose (reflected in its name, which is meant as an 
analogy to gedankenexperiments in physics) is to explore 
the consequences of two basic design principles: 

(1) Completeness. Any value which is permitted in some
context of the language is permissible in any other
meaningful context. In particular, functions and labels are
permitted to be results of functions or values of references
(e.g. variables), without imposing restrictions which maintain
a stack discipline for run-time storage allocation.

(2) The Reference Concept. Assignment and indirect 
addressing are formalized in the following manner: among 
the possible values which may occur in a program are 
objects called references, which in turn possess other values. 
The assignment operation always affects the relation
between some reference and its value.

Neither of these principles is novel. Lisp [1a and 1b]
(in its interpretive implementations), Iswim [2], and Pal
[3] all satisfy the principle of completeness, and the reference
concept is used in Algol 68 [4] and Basel [5]. But
GEDANKEN goes beyond these languages in exploiting the
power of these principles, i.e. in eliminating other language
features which are rendered redundant by completeness
and references. Specifically:

(1) The existence of function-returning and reference-returning
functions allows all compound data structures to
be treated as functions. For example, a one-dimensional
ALGOL-like array is treated as a function whose domain is a
finite set of consecutive integers and which maps each of
these integers into a unique reference. This approach insures
that any process which accepts some data structure
will accept any logically equivalent structure, regardless of
its internal representation. More generally, any data structure
may be implicit; i.e. it may be specified by giving an
arbitrary algorithm for computing or accessing its components.
(Functional data structures have been suggested by
Balzer [6], but his realization of the concept is quite different
from that of GEDANKEN.)

(2) The existence of label variables permits the construc- 
tion of coroutines, quasi-parallel processes, and other un- 
orthodox control mechanisms. This is a direct consequence 
of not imposing a stack discipline on the program control 
information. 

The main limitation of GEDANKEN is that declara- 
tions are not allowed to restrict the value ranges of identi- 
fiers, references, or function results. Languages with this 
property are usually called "typeless," although the types 
of values may be tested during execution. We do not sug- 
gest that type declarations are unimportant or that it is 
trivial to add them to GEDANKEN without destroying 
the generality of the language; this is a major theoretical 
problem. 

The originality of GEDANKEN lies primarily in the 
language features which have been excluded, and the main 
aim of this paper is to demonstrate that these exclusions 
(except typelessness) do not impair generality. For this 
purpose, we include extensive programming examples. 

A formal definition of GEDANKEN is given in [7]. A 
complete but extremely inefficient implementation has 
been produced by translating this formal definition into 
Lisp; this implementation has been used to check all 
examples given in this paper. 

Volume 13 / Number 5 / May, 1970 



Copyright © 1970, Association for Computing Machinery, Inc. 



After describing the syntax of the language and the types 
of values which are manipulated during program execution, 
we discuss the applicative part of the language, i.e. the
evaluation of expressions and the application of functions. 
Finally, the imperative aspects, such as references, assign- 
ment, labels, and jumps, will be introduced. 

Syntax 

Although the importance of GEDANKEN lies in its
semantics, a definite syntax must be specified so that
programming examples can be given. A GEDANKEN program
is a sequence of tokens separated by zero or more blanks,
with at least one blank used as a separator whenever the
juxtaposition would otherwise be ambiguous. The tokens
are sequences of characters classified as follows:

constants           digit strings (denoting integers), quoted strings
reserved words      AND, OR, IF, THEN, ELSE, CASE, OF, IS, ISR
identifiers         all other alphanumeric strings beginning with a letter
punctuation tokens  λ , = : ( ) ; :=

Certain predefined identifiers have standard meanings.
These include: TRUE, FALSE, LL, and UL, which denote
specific primitive values; ERROR, which denotes a built-in
label value causing program termination; and the names of
all built-in functions. (These predefined identifiers differ
from reserved words in that the programmer can override
the standard meanings by declarations.)

The set of token sequences which are well-formed
GEDANKEN programs is specified by the context-free
grammar (over an infinite vocabulary of tokens) in Table I.
The syntactic variables in this grammar are subscripted
to distinguish among phrases with a similar semantic role
but different levels of precedence. Thus phrases of the
classes <exp0>, ..., <exp6> are all called expressions, while
phrases of the classes <pform0> and <pform1> are called
parameter forms. The notation {a}* is used to indicate an
arbitrary number (including zero) of occurrences of the
string a.

It should be noted that a block can consist of a single 
expression; this permits any expression to be parenthesized 
without changing its semantics.

Primitive Values and Functions 

The items of data which are manipulated during the
execution of a GEDANKEN program are called values.
The set of all values is partitioned into seven types: integers,
Booleans, characters, and atoms (collectively called
primitive values), and functions, references, and label values
(collectively called nonprimitive values). (Floating-point
numbers are excluded, but their inclusion would not raise
any significant problems.) Although the language does not
contain type declarations, a complete set of built-in functions
is available for testing the type of a value during
program execution.

Among the primitive values, only atoms are unusual; 



TABLE I. A Grammar for GEDANKEN

<exp0> ::= <constant> | <identifier> | (<block>)
<exp1> ::= <exp0> | <function designator>
<function designator> ::= <exp0> <exp1>
<exp2> ::= <exp1> | <exp1> = <exp1>
<exp3> ::= <exp2> | <exp2> AND <exp3>
<exp4> ::= <exp3> | <exp3> OR <exp4>
<exp5> ::= <exp4> | <conditional exp> | <lambda exp> | <exp4> := <exp5>
<conditional exp> ::= IF <exp6> THEN <exp6> ELSE <exp5>
<lambda exp> ::= λ <pform0> <exp5>
<exp6> ::= <exp5> | <sequence exp> | <case exp>
<sequence exp> ::= <empty> | <exp5>, <exp5> {, <exp5>}*
<case exp> ::= CASE <exp6> OF <exp5> {, <exp5>}*
<pform0> ::= <identifier> | (<pform1>)
<pform1> ::= <pform0> | <sequence pform>
<sequence pform> ::= <empty> | <pform0>, <pform0> {, <pform0>}*
<decl> ::= <pform1> IS <exp6>
<recursive decl> ::= <identifier> ISR <lambda exp>
<label> ::= <identifier> :
<statement> ::= {<label>}* <exp6>
<block> ::= {<decl>;}* {<recursive decl>;}* {<statement>;}* <statement>
<program> ::= <block>



they are similar to atoms in Lisp, except that they lack
property lists and print names. More precisely, the atoms
are a denumerably infinite set of values which may be
tested for equality, but which do not possess any ordering
or arithmetic operations. Two particular atoms, denoted
by the predefined identifiers LL and UL, play a special role
in the language. Additional atoms are created by the
built-in function ATOM, which returns a distinct atom
each time it is applied.

A function is a value which may be applied to another
value called its argument. When so applied, the function
will either: (i) return a value called its result, (ii) transfer
control to a label value without returning a result, (iii)
cause an error stop, or (iv) initiate a nonterminating
computation. (The application of a function may also alter the
state of a computation by producing various side effects,
which will be discussed later.) The set of arguments for
which a function will return a result is called the domain
of the function. A number of built-in functions are
provided which may be used without being defined; additional
"user-defined" functions are produced by the evaluation of
various expressions.

(Proper procedures, in the sense of Algol, are not pro- 
vided in GEDANKEN, since they are equivalent to func- 
tions which execute useful side effects but return an irrele- 
vant result. Functions with multiple arguments are not 
provided, since they are equivalent to functions whose 
arguments are sequences, as described below.) 

The functional approach to data structures is reflected in
the absence of a distinct type of value corresponding to the
conventional notion of a vector or array; the analogous 
values in GEDANKEN are functions. Thus we will use 
the word "vector" to denote those functions which are 
logically equivalent to conventional vectors. 
It is evident that the domain of a GEDANKEN function
which is a vector must include a finite set of consecutive
integers; these integers are the analogue of the subscripts of
a conventional vector. But a conventional vector also has
the property that its set of subscripts is explicit; i.e. there 
must be some method of testing the vector to determine its 
least and greatest subscripts. To reflect this property in 
GEDANKEN, we require that the domain of a vector 
must include, in addition to the subscript set, the atoms 
LL and UL, and that the results of applying the vector to 
LL and UL must be the least and greatest subscripts. 

This leads to the following definition. A function F is
called a vector whenever: (1) its domain includes the atoms
LL and UL; (2) the results of applying F to LL and UL
are integers such that F(UL) ≥ F(LL) - 1; (3) the domain
of F includes all integers i such that F(LL) ≤ i ≤ F(UL).

If F is a vector, then the integers F(LL), F(UL), and
F(UL) - F(LL) + 1 are called the lower limit, upper limit,
and length of F, respectively, and for each integer i such
that F(LL) ≤ i ≤ F(UL), the result of applying F to i
is called the ith component of F.

A vector is called a sequence if its lower limit is 1. 
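
The definition of a vector can be illustrated with a minimal Python sketch. It is not GEDANKEN: the atoms LL and UL are modelled by two unique Python objects, and the names make_vector, length, and is_sequence are hypothetical helpers introduced only for this illustration.

```python
# Hypothetical stand-ins for the atoms LL and UL: unique objects
# that are only ever compared for identity.
LL = object()
UL = object()

def make_vector(lower, items):
    """Return a function that behaves like a vector with the given
    lower limit and components (an illustration, not GEDANKEN)."""
    upper = lower + len(items) - 1
    def vector(i):
        if i is LL:
            return lower
        if i is UL:
            return upper
        if isinstance(i, int) and lower <= i <= upper:
            return items[i - lower]
        raise ValueError("argument outside the domain of the vector")
    return vector

def length(f):
    # Length as defined in the text: F(UL) - F(LL) + 1.
    return f(UL) - f(LL) + 1

def is_sequence(f):
    # Assumes f is already known to be a vector; a sequence is a
    # vector whose lower limit is 1.
    return f(LL) == 1

v = make_vector(3, ["a", "b", "c"])
empty = make_vector(1, [])
```

Note that the empty vector satisfies condition (2) with equality: its upper limit is one less than its lower limit, so its length is zero.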

Although a vector is a kind of function, and a sequence 
is a kind of vector, neither "vector" nor "sequence" is a 
"type" in the usual sense, since one cannot write a program 
which will test whether an arbitrary function is a vector or 
a sequence. Certain operations in GEDANKEN (e.g.
evaluation of sequence expressions or application of the 
built-in function VECTOR) are guaranteed to produce 
vectors, but equally valid vectors may also be produced by 
more general mechanisms (e.g. evaluation of lambda ex- 
pressions). Vectors produced in the latter manner are said 
to be implicit. 

(The realization of vectors in GEDANKEN is in contrast
to several languages, such as Pal, in which subscript
limits are obtained by applying built-in functions to vectors.
In the latter approach vectors are not purely functional,
since they are amenable to other operations than
application. The practical effect is to prohibit implicit
vectors.)

The existence of sequences in GEDANKEN justifies the
elimination of functions with multiple arguments. The
analogue of a conventional function with k arguments,
when either k = 0 or k ≥ 2, is a function whose single
argument is a sequence of length k. For example, the domain
of the built-in function ADD is the set of sequences
of length two whose components are both integers. (This
approach is a direct borrowing from Pal.)

The remaining types of nonprimitive values, references 
and label values, will be defined later. 

Applicative Semantics 

To describe the semantics of GEDANKEN, we follow 
Landin [2] and Evans [3] in dividing the language into an 
applicative part, involving the evaluation of expressions 
and the application of functions, and an imperative part, 
involving assignment and control jumps. We first consider 



the applicative sublanguage, which is obtained by disregarding
references, label values, and the operations which
manipulate them.

Within this sublanguage, the basic operation is the
evaluation of expressions. Since the evaluation of an expression
will usually involve the evaluation of its subexpressions,
the definition of this operation is inherently
recursive. Also, when an expression contains free identifiers,
its evaluation is only meaningful in the presence of some
mapping of these identifiers into values; such a mapping is
called an environment and is said to bind each identifier to a
value.

A complete program is always evaluated in an environment
which binds the predefined identifiers to their
standard values. Whenever the evaluation of an expression
e involves the evaluation of an immediate subexpression
e', then, unless e is a lambda expression or a block, e' is
evaluated in the same environment as e. The evaluation of
lambda expressions and blocks (described in detail below)
involves the concept of extension: if i is an identifier, v is a
value, and η and η' are environments such that η' binds i
to v and specifies the same binding as η for all other
identifiers, then η' is called the extension of η formed by binding
i to v.

We now describe the evaluation of each nontrivial form
of expression. The application of a function to an argument
is performed by a function designator:

<function designator> ::= <exp0>          <exp1>
                          function part   argument part

which is evaluated by first evaluating its function part and
its argument part to obtain values v_f (which must be a
function) and v_a, and then applying v_f to v_a. (Since the
argument part is evaluated before the function is applied,
this form of evaluation is similar to call by value in Algol,
rather than call by name.) Since function designators have
a right-associative syntax, the usual composition of functions
may be written without parentheses; e.g. F(G(X))
may be written as F G X.

Functions may be produced by the evaluation of lambda
expressions:

<lambda exp> ::= λ <pform0> <exp5>
                            body

Basically, the value of a lambda expression is a function
which (when it is applied to an argument at some later
point during the computation) computes its result by
binding the parameter form to its argument and then
evaluating the body. More precisely, if f is the function
obtained by evaluating λ(p)e in the environment η, then the
result of applying f to an argument a will be obtained by
evaluating e in an environment which is the extension of η
formed by binding p to a. (The meaning of binding p to a,
when p is not an identifier, will be defined below.)

This binding mechanism is quite conventional (it is
called FUNARG binding in Lisp and is similar to the
mechanism used in Algol and in PL/I), but a clear
understanding of its implications is vital. There are two separate
actions: (i) the evaluation of the lambda expression to
produce a function, and (ii) the application of this function
to its arguments. The body of the lambda expression is not
evaluated until (ii), but the environment in which the body
is evaluated is an extension of the environment used during
(i) rather than (ii). As a result, when a lambda expression
contains free identifiers, its evaluation in different
environments will produce different functions. For example, in an
environment where Y is bound to an integer k, the evaluation
of λ(X) ADD(X, Y) produces a function which increases
its argument by k.
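
The distinction between actions (i) and (ii) can be seen in a short Python sketch, since Python closures also capture the defining environment. The function name make_incrementer is a hypothetical helper used only for this illustration; it plays the role of evaluating the lambda expression in an environment where Y is bound to k.

```python
def make_incrementer(y):
    # Action (i): evaluating the lambda expression captures the
    # binding of y that is in force here. The body is not run yet.
    return lambda x: x + y

add_two = make_incrementer(2)    # environment binds y to 2
add_ten = make_incrementer(10)   # environment binds y to 10

# A binding of y in the environment where the functions are applied.
# Action (ii) does not consult it.
y = 1000

result_two = add_two(5)
result_ten = add_ten(5)
```

The same lambda expression yields two different functions because it was evaluated in two different environments; the later global binding of y has no effect on either.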

Functions which are sequences may also be produced by
the evaluation of sequence expressions:

<sequence exp> ::= <empty> | <exp5>, <exp5> {, <exp5>}*

Let n be the number of subexpressions. Then the sequence
expression is evaluated by first evaluating its subexpressions
to obtain values v1, ..., vn and then producing a
sequence of length n whose ith component (for 1 ≤ i ≤ n)
is vi.

Because of their low precedence, sequence expressions 
are usually parenthesized, but the parentheses themselves 
do not indicate a sequence expression. Thus the expressions 
( ) and (X, Y) both produce sequences, but (X) has the 
same value as X. There is no sequence expression which 
produces a sequence of length one, but such sequences can 
be produced by the built-in function UNITSEQ, which 
returns a sequence whose only component is the value of its 
argument. 

As noted earlier, a function of n arguments (n ≠ 1) is
treated in GEDANKEN as a function of a sequence of 
length n. This suggests that when a function produced by a 
lambda expression expects to receive a sequence as its 
argument, the parameter form within the lambda expres- 
sion should be able to bind several different identifiers to 
the components of the sequence. To provide this capability 
we extend the notion of a parameter form to include a 
sequence parameter form (which is a rough analogue of a 
formal parameter list in Algol):

<sequence pform> ::= <empty> | <pform0>, <pform0> {, <pform0>}*

The relevant semantics are given by defining (recursively)
the extension of an environment η formed by binding
an arbitrary parameter form p to a value v. This
extension is computed as follows:

(1) If p is an identifier, then η is extended by binding
p to v.

(2) If p has the form (p'), then η is extended by binding
p' to v.

(3) If p is a sequence parameter form, p1, ..., pn
(n ≠ 1), then v, which must be a function, is applied
to each integer from 1 to n, and η is repeatedly
extended by binding each pi to the result of v(i).
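
The three rules above can be sketched in Python under some stated assumptions: an environment is modelled as a dictionary, an identifier as a string, and a sequence parameter form as a tuple of parameter forms. The names bind and seq are hypothetical and are not part of GEDANKEN.

```python
def bind(env, pform, value):
    """Return the extension of env formed by binding pform to value.

    Hypothetical encoding: an identifier is a str and a sequence
    parameter form is a tuple of parameter forms. Parentheses need
    no encoding of their own, since rule (2) only strips them.
    """
    if isinstance(pform, str):                 # rule (1)
        return {**env, pform: value}
    for i, sub in enumerate(pform, start=1):   # rule (3)
        # value must be a function; apply it to 1, 2, ..., n.
        env = bind(env, sub, value(i))
    return env

def seq(*items):
    # A sequence modelled as a function on the integers 1..n
    # (no bounds checking in this sketch).
    return lambda i: items[i - 1]

# Binding the nested form (X, (Y, Z)) to the sequence (3, (4, 5)).
env = bind({}, ("X", ("Y", "Z")), seq(3, seq(4, 5)))
whole = bind({}, "S", seq(3, 4))
```

Because rule (3) only applies the value to integers, any function with the right domain is acceptable as the argument, whether or not it was built by a sequence expression.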

The syntax of sequence expressions and sequence parameter
forms preserves conventional notation for functions
of several arguments. Thus in the evaluation of (λ(X, Y)
body) (3, 4), X is bound to 3 and Y is bound to 4. However,
the sequence argument approach also provides useful
unconventional capabilities, e.g. (λ(X, Y) body) (IF P
THEN (3, 4) ELSE (5, 6)). More importantly, the ability
to bind a single identifier to an entire sequence provides
the equivalent of a function with an indefinite number of
arguments, e.g. (λX body) (IF P THEN (3, 4) ELSE
(5, 6, 7)).

GEDANKEN is similar to Euler [8] in treating all 
types of unlabeled statements as expressions. In particular, 
a block is a form of expression with a meaningful value: 

<block> ::= {<decl>;}* {<recursive decl>;}*
            {<statement>;}* <statement>

where

<decl> ::= <pform1> IS <exp6>

<recursive decl> ::= <identifier> ISR <lambda exp>

Basically, a block is evaluated by first carrying out the 
bindings indicated by its declarations, recursive declara- 
tions, and labels, and then evaluating the statements in 
order from left to right. The value of the block is the value 
of the rightmost statement. The values of preceding state- 
ments are ignored; in the absence of imperative features, 
these statements have no effect. 

More precisely, a block is evaluated as follows (we in- 
clude the binding of labels although it is an imperative 
aspect of the language): 

(1) For each declaration (<decl>), in order from left to
right: the right side of the declaration is evaluated, and 
then the current environment is extended by binding the 
left side of the declaration to the value of the right side. 

(2) The current environment is further extended by
binding each identifier which occurs on the left of a
recursive declaration (<recursive decl>), or as the label of a
statement, to a distinct "dummy" value.

(3) The right side of each recursive declaration is 
evaluated, and its value replaces the corresponding dummy 
value. 

(4) For each label, an appropriate label value is created
and replaces the corresponding dummy value. 

(5) The statements are evaluated in order from left to 
right. 

(6) The value of the block is the value of the rightmost 
statement. 

In steps 2 to 4, the device of binding identifiers to 
dummy values and then replacing the dummy values 
allows an environment to be cyclic, i.e. to bind an identifier 
to a value which is produced by evaluating a lambda 
expression (or label) in the same environment. 
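
Steps 2 and 3 can be sketched in Python, assuming that an environment is a mutable dictionary and that each recursive declaration is represented by a function from the environment to the value of its right side. The names Dummy and make_block_env are hypothetical, and labels (step 4) are omitted.

```python
class Dummy:
    """Placeholder bound in step 2 and replaced in step 3."""

def make_block_env(outer, recursive_decls):
    # recursive_decls maps each identifier to a function that, given
    # the environment, evaluates the lambda expression on its right.
    env = dict(outer)
    for name in recursive_decls:                 # step 2: bind dummies
        env[name] = Dummy()
    for name, make in recursive_decls.items():   # step 3: replace them
        env[name] = make(env)
    return env

# Two mutually recursive functions; each looks the other up in the
# shared environment only when it is applied.
env = make_block_env({}, {
    "EVEN": lambda e: lambda n: True if n == 0 else e["ODD"](n - 1),
    "ODD":  lambda e: lambda n: False if n == 0 else e["EVEN"](n - 1),
})
```

The resulting environment is cyclic in the sense of the text: the functions bound to EVEN and ODD both hold the environment that binds them.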

The essential difference between (nonrecursive) declara- 
tions and recursive declarations is that the right side of a 
declaration "feels" only the bindings caused by preceding 
declarations, while the right side of a recursive declaration 
feels the bindings caused by all declarations in the block, 
including implicit label declarations. Recursive declara- 
tions are needed to define recursive functions conveniently, 
including families of functions which call one another. 
(They also permit the definition of functions which jump
into the immediately enclosing block.) 

Nonrecursive declarations are less essential, but they 
permit convenient constructions such as X IS ADD(X, 
1). More important, their existence allows the right sides 
of recursive declarations to be limited to lambda expres- 
sions, so that meaningless constructions such as X ISR 
ADD(X, 1) are syntactically illegal. 

Conditional expressions have the same meaning as in
Algol. Case expressions have a rather unorthodox meaning
(which is convenient for defining implicit sequences):
CASE e0 OF e1, ..., en is evaluated by first evaluating e0
to obtain a value i; then if i is an integer satisfying
1 ≤ i ≤ n, the value of the case expression is obtained by
evaluating ei; if i is LL or UL the value is 1 or n
respectively; all other values of i give an error stop.
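
A rough Python rendering of the case expression is given below. It assumes that the branches are passed as zero-argument functions, so that only the selected branch is evaluated, and it models LL and UL as unique objects. The names case and implicit_seq are hypothetical.

```python
LL, UL = object(), object()   # stand-ins for the two special atoms

def case(selector, *branches):
    """CASE e0 OF e1, ..., en, with each ei supplied as a thunk."""
    n = len(branches)
    if selector is LL:
        return 1
    if selector is UL:
        return n
    if isinstance(selector, int) and 1 <= selector <= n:
        return branches[selector - 1]()   # evaluate only this branch
    raise RuntimeError("error stop")

def implicit_seq(i):
    # Corresponds to a function whose body is CASE i OF "a", "b", "c".
    return case(i, lambda: "a", lambda: "b", lambda: "c")
```

Because the case expression answers LL and UL with 1 and n, a function whose body is a case expression on its parameter is automatically a sequence of length n.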

The remaining forms of expressions are most easily 
defined as abbreviations. Except for coercion (discussed 
later), they can be eliminated from a program by applying 
the following transformations: 

e1 = e2    =>  EQUAL(e1, e2)
e1 AND e2  =>  (IF e1 THEN e2 ELSE FALSE)
e1 OR e2   =>  (IF e1 THEN TRUE ELSE e2)
e1 := e2   =>  SET(e1, e2)

The built-in function SET will be defined later. EQUAL 
tests the equality of primitive data, but if either com- 
ponent of its argument is a function or a label value, it 
will return FALSE. Its action on references will be de- 
scribed later. 

Theoretically, nonrecursive declarations, sequence parameter
forms, and sequence expressions can also be
regarded as abbreviations. Their occurrences in a program 
can be eliminated by repeated application of the following 
equivalences: 

p IS e; b  =>  (λ(p)(b))(e)

λ(p1, ..., pn) b  (when n ≠ 1)
    =>  λi (p1 IS i 1; ... ; pn IS i n'; b)

e1, ..., en  (when n ≠ 1)
    =>  (i1 IS e1; ... ; in IS en; λi (CASE i OF i1, ..., in))

where n' is an integer constant whose value is n, and i, i1,
..., in are distinct identifiers which do not occur in the
program being transformed.

It should be noted that GEDANKEN does not include
certain features, such as infix arithmetic operators or for
statements, which would enhance the conciseness of the
language without expanding the range of programs which
could be expressed. Such features could be added easily,
but they are not germane to the basic purposes of the
language.

Functional Data Structures 

Even the applicative part of GEDANKEN is sufficient 
to demonstrate the power and flexibility which can be 
obtained by treating data structures functionally. 

As a first example, consider Lisp-like list structures. To 
define analogues of the Lisp functions CONS, CAR, and 
CDR, we treat the two-field list cell produced by CONS as
a function whose domain contains two elements (e.g. 1 
and 2) and which maps these elements into the values of its 
CAR and CDR fields. This viewpoint leads directly to the 
definitions: 

CONS IS λ(X, Y) λZ IF Z = 1 THEN X ELSE Y;
CAR IS λX X 1;
CDR IS λX X 2;
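
For readers more familiar with a current language, the same three definitions can be transcribed into Python almost word for word. This is only an illustrative transcription: Python's None stands in for an end-of-list marker here, which is an assumption of the sketch and not part of the GEDANKEN definitions.

```python
def cons(x, y):
    # The list cell is a function: 1 selects the CAR field and
    # any other argument selects the CDR field.
    return lambda z: x if z == 1 else y

def car(cell):
    return cell(1)

def cdr(cell):
    return cell(2)

# A two-element chain of cells ending in None (a Python stand-in).
cell = cons("head", cons("next", None))
```

No storage structure other than the closure itself is needed: the values of X and Y survive inside the function returned by cons after cons has returned.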

These definitions imply an ability to do list processing
without special built-in functions. In a conventional
list-processing system (e.g. compiled Lisp 1.5 [1a and 1b] or
some extensions of Algol [4, 9]) user-defined functions are
restricted so that storage for the values of their identifiers
obeys a stack discipline. Then list structures, which do not
obey a stack discipline, must be allocated in a separate
storage area, and built-in functions or operations must be
provided for accessing this area. But in GEDANKEN, the
user may develop list-processing by defining function-returning
functions (such as CONS above) which violate
a stack discipline. In effect, all storage is potentially
list-structured.

Although the above approach is workable and theoreti- 
cally attractive, it is more convenient to use sequence 
expressions to create list elements and direct application 
to obtain their subfields. Thus, we write (X, Y) instead of 
CONS (X, Y), X 1 instead of CAR X, and X 2 instead of 
CDR X. Following this approach, we introduce lists by 
first creating an atom to denote the empty list: 

NIL IS ATOM( ); 

and then defining a list to be either the atom NIL or a 
sequence of length two whose second component is a list. 
The following functions will return the length of a list,
find the ith element of a list, and append one list to 
another: 

LISTLENGTH ISR λL IF L = NIL THEN 0
    ELSE INC LISTLENGTH L 2;
LISTELEM ISR λ(I, L) IF L = NIL THEN GOTO ERROR
    ELSE IF I = 1 THEN L 1 ELSE LISTELEM (DEC I, L 2);
APPEND ISR λ(X, Y) IF X = NIL THEN Y
    ELSE (X 1, APPEND (X 2, Y));

Here INC and DEC are built-in functions which increase
or decrease an integer by one.
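
The three list functions can be mirrored in Python as a rough sketch. It assumes that NIL is a unique object and that a sequence of length two is modelled by a function on 1 and 2; the helper pair and the lower-case function names are hypothetical, and GOTO ERROR is approximated by raising an exception.

```python
NIL = object()   # stands in for the atom created by ATOM()

def pair(first, rest):
    # A sequence of length two, modelled as a function on {1, 2}.
    return lambda i: first if i == 1 else rest

def list_length(l):
    return 0 if l is NIL else 1 + list_length(l(2))

def list_elem(i, l):
    if l is NIL:
        raise IndexError("GOTO ERROR")
    return l(1) if i == 1 else list_elem(i - 1, l(2))

def append(x, y):
    # Copies the cells of x and shares the cells of y.
    return y if x is NIL else pair(x(1), append(x(2), y))

abc = pair("a", pair("b", pair("c", NIL)))
de = pair("d", pair("e", NIL))
joined = append(abc, de)
```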

As a second example, consider one-dimensional arrays. 
We have defined a type of function called a vector which is 
the analogue of a one-dimensional array, and we have 
introduced sequence expressions for creating vectors. But 
a sequence expression can only produce a vector which is a 
sequence, and it is inconvenient for producing very long 
vectors. What is needed is a function which will produce a 
vector from a functional specification of its components, 
i.e. which will accept another function, tabulate its results 
over a finite range, and return a "lookup" function for the
resulting table. 

Thus we define a function VECTOR which accepts an
argument (L, U, F), where L and U are integers and F is a
function. If U < L, VECTOR returns an empty vector V
such that V(LL) = L and V(UL) = L - 1. Otherwise,
VECTOR evaluates F(I) for each integer I between L and
U inclusive, and returns a vector V such that V(LL) = L,
V(UL) = U and for L ≤ I ≤ U, V(I) is the value of F(I).
The basic approach is to recur on the length of the vector,
tabulating a single value (bound to T) at each level of
recursion.

VECTOR ISR λ(L, U, F)
    IF GREATER(L, U) THEN
        λI IF I = LL THEN L ELSE IF I = UL THEN DEC L
            ELSE GOTO ERROR
    ELSE (V IS VECTOR(L, DEC U, F); T IS F U;
        λI IF I = UL THEN U ELSE IF I = U THEN T ELSE V I);

It is evident that this function, although theoretically 
correct, will be extremely inefficient in any reasonable 
implementation. For this reason, a built-in function 
VECTOR is provided which is defined to be equivalent to 
the function above (except for coercion). 
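
The recursive definition can be followed step by step in a Python sketch that keeps the same structure. LL and UL are modelled as unique objects, GOTO ERROR as an exception, and the calls list is a hypothetical device added only to show that F is evaluated once per component, at construction time.

```python
LL, UL = object(), object()   # stand-ins for the two special atoms

def vector(l, u, f):
    if l > u:
        def empty(i):
            if i is LL:
                return l
            if i is UL:
                return l - 1
            raise IndexError("GOTO ERROR")
        return empty
    v = vector(l, u - 1, f)   # tabulate the shorter vector first
    t = f(u)                  # evaluate F(U) once, now
    def lookup(i):
        if i is UL:
            return u
        if i == u:
            return t
        return v(i)           # delegate LL and smaller subscripts
    return lookup

calls = []
def square(i):
    calls.append(i)
    return i * i

squares = vector(1, 4, square)
calls_after_build = list(calls)
third = squares(3)
```

Each level of the recursion adds one closure, so a lookup may pass through as many closures as the vector has components. This is the inefficiency the text refers to.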

(This question of efficiency may be clarified by consider- 
ing implementation mechanisms. In a simple implementa- 
tion, functions would possess two distinct internal repre- 
sentations: If a function was produced by evaluating a 
lambda expression, it would be represented by a "lambda 
record" containing a pointer to code which was compiled 
from the lambda expression plus values for each free 
identifier in the lambda expression (i.e. a representation of 
the environment in which the lambda expression was 
evaluated). On the other hand, if a function was created by
evaluating a sequence expression or by the application of 
VECTOR, it would be represented by a "vector record" 
containing domain limit and indexing information plus a
contiguous array of component values. It is evident that 
the above definition of VECTOR would yield a vector 
whose internal representation was a linked list of lambda 
records, each containing one component value, rather than 
a contiguous array.) 

Using lists and vectors, we may illustrate our assertion 
that any process which accepts some data structure will 
accept any logically equivalent structure. Suppose that P 
is a function which expects a sequence as its argument, and 
that we wish to give it a sequence whose ith component is
the ith element of a list L. This can be done in a conventional
manner by evaluating P VECTOR (1, LISTLENGTH L,
λI LISTELEM(I, L)), which copies the
elements of L into a contiguous array. But it is also possible
to evaluate P MAKESEQFROMLIST L, where

MAKESEQFROMLIST IS λL
    λI IF I = LL THEN 1 ELSE IF I = UL THEN LISTLENGTH L
        ELSE LISTELEM(I, L);

MAKESEQFROMLIST does not copy the components of 
L; instead, it returns an implicit sequence which will look 
up the appropriate element of L each time one of its com- 
ponents is accessed. 
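
The implicit sequence can be imitated in Python as follows. The sketch is self-contained, so it repeats hypothetical Python versions of the list helpers; the consumer total is an assumed example of a function P that only ever applies its argument.

```python
LL, UL, NIL = object(), object(), object()   # stand-ins for atoms

def pair(first, rest):
    return lambda i: first if i == 1 else rest

def list_length(l):
    return 0 if l is NIL else 1 + list_length(l(2))

def list_elem(i, l):
    if l is NIL:
        raise IndexError("GOTO ERROR")
    return l(1) if i == 1 else list_elem(i - 1, l(2))

def make_seq_from_list(l):
    # Nothing is copied: every access walks the list again.
    def seq(i):
        if i is LL:
            return 1
        if i is UL:
            return list_length(l)
        return list_elem(i, l)
    return seq

def total(s):
    # A consumer that uses only application, so it accepts any
    # logically equivalent sequence, explicit or implicit.
    return sum(s(i) for i in range(s(LL), s(UL) + 1))

numbers = pair(10, pair(20, pair(30, NIL)))
implicit = make_seq_from_list(numbers)
```

The trade-off is the usual one between copying and indirection: the implicit sequence costs nothing to build, but each component access takes time proportional to its position in the list.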

It is equally possible to produce an implicit list from a 
sequence: 

MAKELISTFROMSEQ ISR λS MLFS1(1, S);
MLFS1 ISR λ(I, S) IF GREATER (I, S UL) THEN NIL
    ELSE λK (CASE K OF S I, MLFS1(INC I, S));

(Here MLFS1 is a subsidiary function which produces an 
implicit list from the subsequence of S that begins with 
the Ith component.) 
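
The converse construction can be sketched in the same hedged style. The helpers seq and to_python_list are hypothetical, and the sketch simplifies the case expression: it distinguishes only the argument 1 from any other argument and does not answer LL or UL.

```python
LL, UL, NIL = object(), object(), object()   # stand-ins for atoms

def seq(*items):
    # An explicit sequence with lower limit 1.
    def s(i):
        if i is LL:
            return 1
        if i is UL:
            return len(items)
        return items[i - 1]
    return s

def mlfs1(i, s):
    if i > s(UL):
        return NIL
    # Both fields are computed on access, as in CASE K OF S I, ...
    return lambda k: s(i) if k == 1 else mlfs1(i + 1, s)

def make_list_from_seq(s):
    return mlfs1(1, s)

def to_python_list(l):
    # Walk the implicit list by repeated application.
    out = []
    while l is not NIL:
        out.append(l(1))
        l = l(2)
    return out

letters = make_list_from_seq(seq("a", "b", "c"))
```

Each cell of the implicit list is created only when the second field of its predecessor is accessed, so a consumer that inspects only a prefix never builds the rest.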

The data structures shown so far have the limitation that
once a structure has been created, its components or ele- 
ments cannot be altered. To overcome this limitation we 
must introduce the imperative aspects of GEDANKEN. 

References 

In any programming language which permits assign- 
ment, there is a class of objects which are affected by 
assignment. We will call these objects references; other 
terms used commonly in the literature are "name" and
"L-value." At any time during the execution of a program,
each reference possesses some value. The effect of an assignment
operation r := v is to cause the reference denoted by
r to possess the value denoted by v. 

Within this definitional framework, there are at least 
three distinct approaches to assignment (see Figure 1):

(1) Identifiers are used as references. This approach is 
used in Snobol [10], where a form of indirect addressing 
is achieved by allowing identifiers to occur as values. Un- 
fortunately, the approach does not mesh well with block 
structure; a discussion of the difficulties is given by Kain 

(2) References are distinct from either identifiers or 
values, and are interposed between all other value-denoting 
entities and their values. Thus the bindings of identifiers,
the arguments and results of functions, and the com- 
ponents of vectors are all references, and the values 
denoted by these entities are actually the values possessed 
by the references. This approach is used in Pal, and to a 
large extent in Fortran and PL/I, except that in the latter
languages function results are values, and identifiers may 
be bound directly to functions and label values, but not to 
primitive values. The approach meshes well with block 
structure but is rather inflexible; one moves from the 
applicative situation, where assignment is impossible, to
the opposite extreme, where every value-denoting entity
can be affected by assignment. 

(3) References are treated as a distinct type of value, so 
that any value-denoting entity can denote either a con- 
ventional value or a reference which in turn possesses a 
value. This approach is used in Algol 68 and Basel. 
(Basel also permits a form of assignment which alters 
identifier binding.) It is compatible with block structure
and is more flexible than the previous approach, since the 
programmer can introduce references in just those con- 
texts where he intends to do assignment. Advantages 
should accrue in both the optimization of data representa- 
tions and the checking of erroneous assignment statements. 

(The above categorization must be qualified by the fact 
that Fortran, PL/I, Algol 68, and Basel all have
type-declaration mechanisms which affect their treatment of
assignment. A discussion of this interaction is beyond the 
scope of this paper.) 

In GEDANKEN we have chosen to use the third ap- 
proach to assignment. Thus we introduce a new, denum- 
erably infinite set of values called references, and stipulate 
that each reference possesses some other value (which may 
itself be a reference). Three built-in functions are provided 
to manipulate references: REF, SET, and VAL. REF X
returns a distinct reference each time it is applied; this
reference is initialized to possess the value X. SET(R, X) 
(which can be abbreviated R := X) causes R (which must
be a reference) to possess the value X, and also returns X; 
its action on R is an example of a side effect. VAL R returns 
the value possessed by R (which must be a reference). 

For example, under the scope of the declaration X IS 3, 
the identifier X is bound to the integer 3, and this binding 
cannot be altered by assignment. Evaluation of the expression
X := 4 would give an error, since 3 is not a reference.
Analogously, under the scope of the declaration X IS 
REF 3, the identifier X is bound to the reference created 
by REF, and this binding cannot be changed by assign- 
ment. But now evaluation of X := 4 is legitimate, and 
causes the value possessed by the reference bound to X to 
change from 3 to 4. Thus in the execution of the block 

(X IS REF 3; VAL X = 3; X := 4; VAL X = 4) 

both equality predicates will be true. 
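
The behavior of REF, SET, and VAL can be modeled in a few lines of Python. This is a minimal sketch rather than a definition of the built-in functions: the class Ref and the names ref, val, and set_ are our own, and the coercion of arguments (introduced below) is deliberately omitted.

```python
class Ref:
    """A reference: a distinct object that possesses one value at a time."""

    def __init__(self, value):
        self.value = value


def ref(x):
    """REF X: return a new reference initialized to possess x."""
    return Ref(x)


def val(r):
    """VAL R: return the value possessed by the reference r."""
    if not isinstance(r, Ref):
        raise TypeError("VAL requires a reference")
    return r.value


def set_(r, x):
    """SET(R, X): make r possess x, and return x."""
    if not isinstance(r, Ref):
        raise TypeError("SET requires a reference")
    r.value = x
    return x


x = ref(3)
before = val(x)      # the first predicate: VAL X = 3
returned = set_(x, 4)
after = val(x)       # the second predicate: VAL X = 4
```

Applying set_ to the integer 3 raises an error in this model, which mirrors the case of X IS 3 discussed above.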

The major difficulty with this approach is the frequent 
necessity for using the function VAL. For example, under 
the scope of the declarations X IS REF 3; Y IS REF 4; 
one would write ADD (VAL X, VAL Y) rather than 
ADD(X, Y), since ADD acts upon integers rather than 
references. To alleviate this difficulty, we introduce 
coercion conventions into GEDANKEN; i.e. we stipulate
that references will be replaced by their values in certain 
contexts which would otherwise be meaningless. 

Specifically, let COERCE be the function 

COERCE ISR λX IF ISREF X THEN COERCE VAL X ELSE X;

(which is available as a built-in function), and define "to 
coerce X" to mean the replacement of X by COERCE X. 
Then:

(1) All built-in functions which would otherwise be 
meaningless coerce their argument or the appropriate 
components of their arguments. For example, ADD(X, Y) 
is equivalent to ADD (COERCE X, COERCE Y), but 
ISREF X is not equivalent to ISREF COERCE X, nor 
VAL X to VAL COERCE X. 

(2) REF X coerces X, SET(R, X) (and therefore R 
:= X) coerces X, and EQUAL(X, Y) (and therefore X = 
Y) coerces both X and Y. Since these functions would each 
be meaningful for references without coercion, analogous 
noncoercing functions, named NCREF, NCSET, and 
NCEQUAL, are also provided. NCREF and NCSET 
permit references to possess values which are also refer- 
ences. NCEQUAL can be used to determine whether two 
values are the same reference. 

(3) Conditional and case expressions coerce the values 
of their leftmost subexpressions. 

(4) Expressions involving AND and OR coerce the 
values of both their subexpressions. 

(5) A function designator coerces the value of its function 
part. 

(6) When a sequence parameter form p1, ..., pn is
bound to a value a, each pi will be bound to (COERCE a)(i).

(7) Vectors which are created by evaluating sequence 
expressions or by application of the built-in functions
VECTOR or UNITSEQ will coerce their argument. 

Despite their ad hoc appearance, most of these coercion 
rules are instances of the general principle that coercion 
should only occur in situations which would otherwise give 
an error termination. The exceptions are rules (2) and 
(4), which are simply concessions to conventional nota- 
tion. 
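
A few of these rules can be illustrated by a Python sketch. The class Ref and the lower-case function names are our own modeling choices, only rules (1) and (2) are covered, and the behavior chosen for ncequal on values that are not references is an assumption.

```python
class Ref:
    def __init__(self, value):
        self.value = value


def coerce(x):
    """COERCE: follow references until a value that is not a reference is found."""
    while isinstance(x, Ref):
        x = x.value
    return x


def add(x, y):
    """Rule (1): a built-in function coerces arguments that would otherwise be meaningless."""
    return coerce(x) + coerce(y)


def ref(x):
    """Rule (2): REF coerces its argument."""
    return Ref(coerce(x))


def ncref(x):
    """NCREF: the new reference may possess another reference."""
    return Ref(x)


def equal(x, y):
    """Rule (2): EQUAL coerces both arguments."""
    return coerce(x) == coerce(y)


def ncequal(x, y):
    """NCEQUAL: references are compared by identity (other cases are an assumption)."""
    if isinstance(x, Ref) or isinstance(y, Ref):
        return x is y
    return x == y


x, y = ref(3), ref(4)
chain = ncref(x)   # a reference whose value is itself a reference
```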

Data Structures with Embedded References 

The utility of references becomes apparent when refer- 
ence-returning functions are used to embed references 
within data structures, yielding structures which can be 
altered by assignment. 

This approach provides precise control over the ways in 
which data structures can be altered. Thus the GEDAN- 
KEN equivalent of an ALGOL-like one-dimensional array 
is a vector whose components are references, e.g. 

X IS VECTOR(1, 100, λI REF 0);

Under the scope of this declaration, assignment can be 
made to the components of X, e.g. X(7) := 10, but not to 
X itself. In particular, the subscript limits X LL and X 
UL are fixed by the declaration. 

On the other hand, the equivalent of a string variable is 
provided by a reference whose value is a vector: 

S IS REF VECTOR(1, 100, F);

Here assignment can be made to S itself (possibly changing 
the subscript limits) but not to its components. 
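
The contrast between the two declarations can be made concrete with a Python sketch. The class Ref and the helper vector are hypothetical stand-ins for REF and VECTOR, the strings "LL" and "UL" stand in for the limit arguments, and the string example uses shorter vectors than the text for brevity.

```python
class Ref:
    def __init__(self, value):
        self.value = value


def vector(ll, ul, f):
    """Stand-in for VECTOR: fixed limits, components computed once by f."""
    items = tuple(f(i) for i in range(ll, ul + 1))

    def v(i):
        if i == "LL":
            return ll
        if i == "UL":
            return ul
        return items[i - ll]
    return v


# Array: a fixed vector whose components are references.
X = vector(1, 100, lambda i: Ref(0))
X(7).value = 10                                   # X(7) := 10

# String variable: a reference whose value is a vector of plain values.
S = Ref(vector(1, 5, lambda i: "hello"[i - 1]))
S.value = vector(1, 2, lambda i: "hi"[i - 1])     # S := a shorter vector
```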

A second consequence of the reference concept is the 

Volume 13 / Number 5 / May, 1970 



ability to define data structures or sets of data structures 
which share elements, in the sense that assignment to one 
element will affect another. Consider a square matrix M.
We could define M as a vector of vectors, i.e. 

M IS VECTOR(1, 10, λI VECTOR(1, 10, λJ REF 0));

but this leads to the inconvenience of referring to an ele- 
ment of M by (M I) J. It is more natural to define M as a 
reference-returning function of pairs of integers: 

M IS (M1 IS VECTOR(1, 10, λI VECTOR(1, 10, λJ REF 0));
   λ(I, J) (M1 I) J);

so that an element is referred to as M(I, J). Now consider
the additional declarations:

MT IS λ(I, J) M(J, I); MD IS λI M(I, I);

Here MT and MD denote the transpose and diagonal of M, 
in the sense that assignment to an element of one matrix 
affects the corresponding elements of the others. 

Elements may also be shared within the same data 
structure. For example, 

S IS (S1 IS VECTOR(1, 10, λI VECTOR(1, I, λJ REF 0));
   λ(I, J) IF NOT GREATER(J, I) THEN (S1 I) J ELSE (S1 J) I);

defines a symmetric matrix in which assignment to S(I, J)
also alters S(J, I). 
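
The sharing of elements can be observed directly in a Python model. In this sketch the class Ref stands in for references, nested Python lists stand in for the vectors M1 and S1, and the functions M, MT, MD, and S follow the declarations above; the particular indices and values are arbitrary.

```python
class Ref:
    def __init__(self, value):
        self.value = value


# M1 is a 10 by 10 grid of references; M, MT and MD are views onto it.
M1 = [[Ref(0) for _ in range(10)] for _ in range(10)]


def M(i, j):
    return M1[i - 1][j - 1]


def MT(i, j):          # transpose
    return M(j, i)


def MD(i):             # diagonal
    return M(i, i)


M(2, 5).value = 7      # visible through the transpose
MD(3).value = 9        # visible through M itself

# S1 is triangular: row i holds i references, so S(i, j) and S(j, i) coincide.
S1 = [[Ref(0) for _ in range(i)] for i in range(1, 11)]


def S(i, j):
    return S1[i - 1][j - 1] if j <= i else S1[j - 1][i - 1]


S(2, 6).value = 4
```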

The embedding of references in list structures also pro- 
vides control over the ways in which these structures may 
be altered. An example is the property list, which is a list 
of property-value pairs subject to two operations: the
value paired with a given property may be looked up; or 
the value paired with a given property may be changed, 
adding a new pair to the list if the property is not already 
present. It is evident that references must occur in the 
property list at two points: each value must be a reference, 
so that it can be changed; and the entire list must be a 
reference, so that new pairs can be added. 

The following function manipulates such property lists. 
Given a property P and a (reference to a) property list L, 
PROPVAL(P, L) searches L for an occurrence of P. If P is
found, the reference paired with P is returned. Otherwise, 
a pair consisting of P and a new reference (initialized to 
zero) is added to L, and the new reference is returned. The 
argument P is coerced. 

PROPVAL IS λ(P, L)
   (P IS COERCE P;
    SEARCHL ISR λX
       IF X = NIL THEN
          (NEWV IS REF 0; L := ((P, NEWV), VAL L); NEWV)
       ELSE IF (X 1) 1 = P THEN (X 1) 2 ELSE SEARCHL X 2;
    SEARCHL VAL L);

An application of this function can occur on either side of 
an assignment operation; on the right side it will act to
look up a value, on the left side it will act to alter a value. 
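
A Python rendering of the same search may help. It is a sketch under stated assumptions: the class Ref stands in for references, None stands in for NIL, a list is a nested pair (first, rest), and the coercion of P is omitted.

```python
class Ref:
    def __init__(self, value):
        self.value = value


NIL = None  # stands in for the empty list


def propval(p, plist):
    """Return the reference paired with p, adding a new pair if p is absent.

    plist is a Ref whose value is a linked list of ((property, Ref), rest).
    """
    x = plist.value
    while x is not NIL:
        (prop, value_ref), rest = x
        if prop == p:
            return value_ref
        x = rest
    new_ref = Ref(0)
    plist.value = ((p, new_ref), plist.value)   # L := ((P, NEWV), VAL L)
    return new_ref


plist = Ref(NIL)
propval("color", plist).value = "red"   # left side of an assignment: alter
looked_up = propval("color", plist).value   # right side: look up
```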

A further step can be taken by viewing the property list 
itself as a reference-returning function which accepts a
property and returns a reference to the corresponding
value. The following function (of no arguments) returns
such functional property lists:

MAKEPROPLIST IS λ( ) (L IS REF NIL; λP PROPVAL(P, L));

Each application of MAKEPROPLIST returns a new 
instance of PROPVAL, with L bound to a private "own 
variable." Since a property can be any primitive value, a 
functional property list is similar to a reference-valued
vector, except that it has an indefinite domain. Indeed, 
functional property lists can be used to provide an efficient 
implementation of sparse vectors. 
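
The "own variable" corresponds to what is now usually called a closure. The Python sketch below shows the sparse-vector use. Note that it swaps in a Python dict for the linked list searched by PROPVAL, so it illustrates the interface rather than the representation; the names Ref and make_prop_list are our own.

```python
class Ref:
    def __init__(self, value):
        self.value = value


def make_prop_list():
    """Return a function mapping each property to its own reference."""
    table = {}   # private to this instance; a dict replaces the linked list

    def lookup(p):
        if p not in table:
            table[p] = Ref(0)    # absent properties start at zero
        return table[p]
    return lookup


sparse = make_prop_list()
sparse(1000000).value = 3.5      # only the touched indices use storage
other = make_prop_list()         # an independent property list
```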

As a final example of the use of references, suppose that 
READ is a function such that each application of READ 
produces the next item of data from some input stream, 
and that we wish to produce an implicit list of the succes- 
sive items in the stream. The following function (of no 
arguments) returns such a list: 

MAKERLIST ISR λ( )
   (B IS REF 0; λI
      (IF B = 0 THEN B := (READ( ), MAKERLIST( )) ELSE ( );
       B I));

The result of MAKERLIST is an implicit list (whose 
implicit length is infinite) which only applies READ as 
items of data are actually needed, and only stores pre- 
viously read items which are still accessible. 
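
The same lazy behavior can be sketched in Python. Here the input function is passed as a parameter named read instead of being global, which is an assumption of this sketch, and the list consumed exists only to make the reads observable.

```python
class Ref:
    def __init__(self, value):
        self.value = value


def make_rlist(read):
    """Implicit list of the items delivered by successive calls of read()."""
    b = Ref(0)                   # 0 means "this cell has not been read yet"

    def lst(i):
        if b.value == 0:
            b.value = (read(), make_rlist(read))   # read one item, on demand
        return b.value[i - 1]    # 1 selects the item, 2 the rest of the list
    return lst


stream = iter([5, 6, 7])
consumed = []


def read():
    item = next(stream)
    consumed.append(item)
    return item


items = make_rlist(read)
```

Accessing items(1) twice applies read only once, since the first access stores the item in the reference b.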

Implicit References 

The utility of implicit data structures suggests the 
introduction of an analogous facility for references. Thus
we introduce the concept of an implicit reference, i.e. a
value whose external appearance is the same as a reference,
but which may carry out an arbitrary computation each 
time it is set or evaluated. (Implicit references are related 
to doublets in Pop-2 [12].) 

To specify an implicit reference, the programmer must
provide two functions: a "setting function" S which will
be executed each time a value is assigned to the implicit 
reference, and an "evaluating function" V which will be 
executed each time the implicit reference is evaluated. Thus 
an implicit reference is produced by applying the built-in 
function IMPREF(S, V), where S and V may be arbitrary 
functions of one and zero arguments respectively. Each
application of IMPREF produces a distinct implicit
reference, and these implicit references satisfy the predicate 
ISREF and are coerced in the same manner as conven- 
tional references. But the effect of SET or VAL on an 
implicit reference is to execute S or V. Specifically, if R
is the result of IMPREF(S, V), then 

NCSET(R, X) = (S X; X) 

SET(R, X) = (X IS COERCE X; S X; X) 

VAL R = V ( ) 
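
These three equations can be modeled in Python as follows. The classes Ref and ImpRef and the temperature example are our own illustration, not part of GEDANKEN: fahrenheit is an implicit reference whose setting and evaluating functions convert to and from a conventional reference celsius.

```python
class Ref:
    def __init__(self, value):
        self.value = value


class ImpRef:
    """An implicit reference built from a setting and an evaluating function."""

    def __init__(self, s, v):
        self.s, self.v = s, v


def is_ref(x):
    return isinstance(x, (Ref, ImpRef))


def val(r):
    return r.v() if isinstance(r, ImpRef) else r.value   # VAL R = V( )


def coerce(x):
    while is_ref(x):
        x = val(x)
    return x


def ncset(r, x):
    if isinstance(r, ImpRef):
        r.s(x)                   # NCSET(R, X) = (S X; X)
    else:
        r.value = x
    return x


def set_(r, x):
    return ncset(r, coerce(x))   # SET coerces X before applying S


celsius = Ref(100)
fahrenheit = ImpRef(lambda f: ncset(celsius, (f - 32) * 5 / 9),
                    lambda: val(celsius) * 9 / 5 + 32)
boiling = val(fahrenheit)        # evaluated from celsius on demand
returned = set_(fahrenheit, 32)  # stores the converted value in celsius
```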

To illustrate the use of implicit references, consider the 
problem of protecting a reference-valued vector. Suppose 
that P is a function which accepts a vector whose components
are references. We wish to apply P to such a
vector V, but to protect the components of V from being
affected by P; i.e. we want these components to revert to
their original values after the application of P is finished. 
The simplest approach is to copy V by executing P 
VECTOR(V LL, V UL, λI REF V I), but this will be
inefficient if V is large and only a few components are reset 
by P. An alternative approach is to maintain a "change 
list" of the components of V which have been altered by 
P. This may be done by executing P PSEUDOCOPY V, 
where 

PSEUDOCOPY IS λV
   (CL IS REF NIL;
    SEARCHCL ISR λ(X, I, F, G) IF X = NIL THEN G( )
       ELSE IF (X 1) 1 = I THEN F (X 1) 2
       ELSE SEARCHCL(X 2, I, F, G);
    λI (I IS COERCE I;
       IF I = LL THEN V LL ELSE IF I = UL THEN V UL
       ELSE IF NOT ISINTEGER I OR GREATER(V LL, I)
          OR GREATER(I, V UL)
       THEN GOTO ERROR
       ELSE IMPREF(
          λX SEARCHCL(VAL CL, I, λR NCSET(R, X),
             λ( ) (CL := ((I, NCREF X), VAL CL))),
          λ( ) SEARCHCL(VAL CL, I, VAL, λ( ) VAL V I))));

The result of PSEUDOCOPY is an implicit vector 
whose components are implicit references. Internally, CL 
is a reference to the change list, which is a list of pairs, each 
containing an integer argument of some altered com- 
ponent and a reference to the current value of that com- 
ponent. SEARCHCL is a subsidiary function which 
searches a change list X for a pair beginning with the 
integer I. If such a pair is found, SEARCHCL returns the
result of F applied to the reference paired with I; otherwise
SEARCHCL returns the result of G, which is a function 
of no arguments. (The noncoercing functions NCSET and 
NCREF are used to allow the values possessed by the 
components of V to be references.) 
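
The change-list technique can be sketched in Python. Several simplifications are assumed here and should not be read back into PSEUDOCOPY: the limits are passed explicitly as ll and ul, a dict replaces the linked change list and SEARCHCL, and an exception replaces GOTO ERROR.

```python
class Ref:
    def __init__(self, value):
        self.value = value


class ImpRef:
    def __init__(self, setter, getter):
        self.setter, self.getter = setter, getter


def val(r):
    return r.getter() if isinstance(r, ImpRef) else r.value


def ncset(r, x):
    if isinstance(r, ImpRef):
        r.setter(x)
    else:
        r.value = x
    return x


def pseudocopy(v, ll, ul):
    """v maps each index in ll..ul to a Ref; the result protects those Refs."""
    changes = {}   # change list: index -> private Ref holding the new value

    def component(i):
        if not isinstance(i, int) or i < ll or i > ul:
            raise IndexError(i)

        def setter(x):
            if i in changes:
                changes[i].value = x     # component already altered
            else:
                changes[i] = Ref(x)      # first alteration: record it

        def getter():
            return changes[i].value if i in changes else val(v(i))
        return ImpRef(setter, getter)
    return component


original = [Ref(k) for k in range(1, 6)]
V = lambda i: original[i - 1]
copy = pseudocopy(V, 1, 5)
ncset(copy(2), 99)   # alters the pseudo-copy only
```

Unaltered components read through to V, so no storage is spent on them.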

Label Values 

The final type of value used in GEDANKEN is the 
label value. These values are created during execution of a 
block containing labeled statements, and are used as 
arguments to the built-in function GOTO, which never 
returns but instead causes a transfer of control to the 
computational state represented by the label value. 

A more precise description requires introducing a model 
of the interpretation of GEDANKEN by an abstract
machine. A complete description of such a model (given 
in [7]) is beyond the scope of this paper, but the following 
aspects are relevant to an understanding of the label and
GOTO mechanisms: 

During the execution of a program (at any instant when 
a statement is about to be evaluated) the state of the 
abstract interpreter will include the following entities: 

(1) A control, which gives a list of the statements re- 
maining to be evaluated in the current block. 

(2) An environment, which gives the identifier bindings
to be used in the current block. 



(3) A dump, which specifies the computations to be 
performed after the current block is completed. The 
dump is a pushdown stack containing an entry for 
each block and lambda-expression body whose 
evaluation is incomplete; each entry contains a 
control and an environment (plus additional in- 
formation which is needed to describe partially 
evaluated compound expressions).

(4) A memory, which specifies the mapping of references 
into their values. 

A label value consists of a control, an environment, and 
a dump. During the evaluation of a block, immediately 
before the first statement is evaluated, a label value is 
created for each label in the block; each label value con- 
tains a list of the statements between the corresponding 
label and the block end, plus the current environment 
(including the bindings of the labels themselves) and dump. 

When the built-in function GOTO is applied to a label 
value, the current control, environment, and dump are 
replaced by the constituents of the label value, and execu- 
tion continues with the first statement of the new control. 
The memory is not altered. 

This mechanism permits jumps within the same block 
(which leave the environment and dump unchanged) or to 
higher level blocks, with the same effect as in Algol. But 
the fact that label values can be possessed by references or
returned by functions also provides the ability to jump 
back into a block after it has been exited from. It is this 
capability which allows the construction of coroutines. 

Coroutines 

A coroutine is a procedure which can relinquish control 
to its calling program and later be reactivated to continue 
computation. The simplest situation is that of two procedures,
each of which treats the other as a subroutine.

As an example, suppose that COMPILE is a procedure 
which produces a succession of data items called instructions,
outputting each instruction by applying a function
OUT, and that ASSEMBLE is a procedure which accepts
a succession of instructions, inputting each instruction by
applying a function IN. If OUT and IN are arguments to 
COMPILE and ASSEMBLE respectively, we have 

COMPILE ISR λOUT ( ... OUT X ... );
ASSEMBLE ISR λIN ( ... X := IN( ) ... );

We now want to couple these procedures so that 
ASSEMBLE receives the output of COMPILE. Speci- 
fically, we want to run ASSEMBLE until it requests input, 
then run COMPILE until it produces the required output,
then run ASSEMBLE again, etc. The necessary program 
can be written by using label-valued references which are
global to both IN and OUT: 

(LC IS REF 0; LA IS REF 0; INST IS REF 0;
 LC := LC1; ASSEMBLE(λ( ) (LA := LA1; GOTO LC;
                          LA1: VAL INST)); GOTO DONE;
 LC1: COMPILE(λX (LC := LC2; INST := X; GOTO LA;
                  LC2:)); GOTO ERROR;
 DONE:);
Here LA and LC are label-valued references saving the
current states of ASSEMBLE and COMPILE, and INST
is a third reference used to hold the instruction being
transmitted from COMPILE to ASSEMBLE. If COMPILE
finishes while ASSEMBLE is still waiting for another
instruction, an error stop occurs.
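
Python has no label values, but the same coupling can be approximated with a generator, which we swap in here as a different and weaker mechanism: a generator can be resumed only from its latest suspension point, whereas a label value can be re-entered repeatedly. In this sketch compile_program and assemble are hypothetical stand-ins for COMPILE and ASSEMBLE, and the PUSH and HALT instructions are invented for the example.

```python
def compile_program(words):
    """Hypothetical COMPILE: each yield plays the role of OUT X."""
    for word in words:
        yield ("PUSH", word)
    yield ("HALT", None)


def assemble(read_instruction):
    """Hypothetical ASSEMBLE: applies IN (read_instruction) until HALT."""
    code = []
    while True:
        op, arg = read_instruction()
        code.append(op if arg is None else f"{op} {arg}")
        if op == "HALT":
            return code


def couple(words):
    """Run assemble, resuming the compiler whenever input is requested."""
    producer = compile_program(words)

    def read_instruction():
        try:
            return next(producer)       # resume COMPILE until its next OUT
        except StopIteration:
            # COMPILE finished while ASSEMBLE was still waiting.
            raise RuntimeError("error stop") from None
    return assemble(read_instruction)


result = couple(["a", "b"])
```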

Nondeterministic Algorithms 

Label values in GEDANKEN are closely related to
"processes" in simulation languages such as Simula [13a
and 13b]; both are mechanisms which allow the state of a
suspended computation to be saved as an item of data. The 
essential difference is that further execution of a computa- 
tion which was saved as a process causes the process to be 
updated, while further execution of a computation saved 
as a label value leaves the label value unchanged. Thus 
label values can be used to repeatedly initiate execution 
from the same state. 

This capability can be used to program a mode of execu- 
tion for nondeterministic algorithms [14] in which alterna- 
tive paths are pursued concurrently. A simple example is 
nondeterministic parsing. It is fairly straightforward to 
convert a context-free grammar into a recursive parsing 
function. Unfortunately, for many grammars this function 
will contain nondeterministic branches, i.e. points at which 
a conditional branch must be performed although the current
state of the parse is insufficient to determine this
branch. 

When such nondeterminism exists, parsing can be ac- 
complished by simulating a finite set of independent 
parsers, all accepting the same input string and obeying 
the same program, but with different control states. When 
a parser encounters a nondeterministic branch, it expands 
into two separate parsers; when a parser reads an input 
character which is inconsistent with its control state, it is 
deleted. 

Specifically, we assume that PARSE(IN, AMB, FAIL)
is a function which accepts two functions IN and AMB, 
and a label value FAIL, and returns some representation 
of a successful parse. The function IN, of no arguments, is 
applied by PARSE to read each character of the input 
string. The function AMB, whose argument is a label 
value, is applied to execute a nondeterministic branch; one 
side of the branch returns from AMB while the other 
jumps to the label-valued argument. PARSE jumps to the 
label value FAIL when it encounters an inconsistent 
character. We assume that PARSE does not set any 
references, or at least that it does not expect the value of 
any reference to be preserved across an application of IN 
or AMB. 

The following program carries out the concurrent execu- 
tion of PARSE, synchronizing the independent parsers by 
their reading of characters: 

(C IS REF NIL; W IS REF NIL; R IS REF NIL;
 CHAR IS REF NIL;
 C := (PARSE(λ( ) (W := (L1, VAL W); GOTO CONT;
                  L1: VAL CHAR),
             λL2 (R := (L2, VAL R)), CONT),
       VAL C);
 CONT: IF R = NIL AND W = NIL THEN GOTO DONE
    ELSE IF R = NIL
       THEN (CHAR := READCHAR( ); R := W; W := NIL)
       ELSE ( );
    (L IS R 1; R := R 2; GOTO L);
 DONE: VAL C)

Each independent parser is represented by a label value if 
it has not completed its parse, or by its result if it has 
completed its parse. The finite set of parsers is main- 
tained by the values of the references C, W, and R. C gives 
a list of the results of completed parses, W gives a list of 
label values representing the parsers which are waiting for 
the next character, and R gives a similar list for the parsers 
which are ready for execution before reading the next 
character. The reference CHAR keeps track of the current 
character, and is updated by the built-in function READ- 
CHAR. The label CONT is reached whenever execution is 
to be switched from one parser to another. The final value 
of the block is the list of completed parses; the input 
string is ill formed, well formed, or ambiguous depending 
upon whether this list has zero, one, or more than one 
element. 
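
The scheduling discipline can be imitated in Python by writing the parser in continuation-passing style, so that closures play the part of label values. This is a swapped-in technique and only a sketch: the grammar S -> 'a' S 'a' | 'a', the END marker, and all function names are assumptions of the example, and a parser "fails" simply by returning without scheduling anything.

```python
END = object()   # end-of-input marker (an assumption of this sketch)


def run_parsers(start, text):
    """Run all parsers in lock step, one input character at a time."""
    ready, waiting, complete = [], [], []

    def read(k):             # IN: suspend until the next character arrives
        waiting.append(k)

    def amb(k1, k2):         # AMB: split into two independent parsers
        ready.extend([k1, k2])

    def drain():             # run every parser that is ready
        while ready:
            ready.pop()()

    ready.append(lambda: start(read, amb, complete.append))
    for ch in list(text) + [END]:
        drain()
        suspended = list(waiting)
        del waiting[:]
        for k in suspended:  # hand the character to each waiting parser
            k(ch)
    drain()
    return complete


def parse_s(read, amb, k):
    """S -> 'a' S 'a' | 'a'; k receives the nesting depth of the parse."""
    def after_first(ch):
        if ch != "a":
            return                       # inconsistent character: deleted
        amb(lambda: k(0),                # alternative S -> 'a'
            lambda: parse_s(read, amb,   # alternative S -> 'a' S 'a'
                            lambda n: read(
                                lambda c: k(n + 1) if c == "a" else None)))
    read(after_first)


def start(read, amb, done):
    # A parse is complete only if the input ends immediately afterwards.
    parse_s(read, amb, lambda n: read(lambda c: done(n) if c is END else None))
```

For this grammar every well-formed string has exactly one parse, so the returned list has at most one element.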

(This approach to parsing is basically the same as that 
used in the Cogent programming system [15a and 15b].
It is presented here as an illustration of the generality of 
GEDANKEN, but it does not represent a significant 
advance in the field of parsing techniques. Although it is 
reasonably efficient for a large class of unambiguous grammars,
at least if the function PARSE is carefully constructed,
some ambiguous grammars will cause an exponential
growth in the number of parsers and are better
treated by other methods, such as that of Earley [16].) 

Limitations and Possible Extensions

The goal of applying the basic principles of GEDANKEN
to the design of an efficient general purpose programming
language raises several interesting research problems: 

(1) Addition of Type Declarations. The most natural 
approach is probably an extension of Hoare's concept of 
record classes [9]. The programmer would be able to 
declare an arbitrary number of disjoint function, reference, 
and label classes, and would specify the range of each
identifier, function result, and reference value to be some 
union of such classes (and/or predefined classes of primi- 
tive values). All functions in the same class would have the 
same domain-range relation, and all references in the same 
class would have the same set of possible values. 

However, the functional approach to data structures will
require unusual flexibility in the specification of the domain-range
relations of functions. If an inhomogeneous
data structure such as a record is to be treated as a function,
then it must be possible to specify that the range
of such a function depends on its argument. For example,
the set of lists of integers would be the union of the set 
{NIL} with a class of functions with domain (1, 2) which 
map 1 into an integer but map 2 into a list of integers. 

An elaboration of this approach to type, limited to a 
purely applicative language, is described in [17]. 

Communications of the ACM 317 



(2) Open Functions. Efficient implementation of func- 
tional data structures will require that certain functions be 
compiled into open code, i.e. that function designators 
should be replaced by modified copies of the corresponding
lambda-expression body, and that these copies should then 
be simplified to take advantage of constant arguments. 
This capability could be provided by a macro-definitional 
facility. A second approach, more in keeping with the spirit 
of GEDANKEN, would be to permit certain lambda 
expressions to be given an OPEN attribute. 

This raises the question of whether a compiler could
determine automatically when a designator of a lambda- 
defined function should be replaced by a copy of the func- 
tion body. One might conjecture that such an expansion 
could be performed for any function which was defined by 
a nonrecursive declaration. Unfortunately, this conjecture 
is disproved by the existence of a nonrecursive fixed-point 
function: 

Y IS λG (U IS λV G(λX (V V) X); U U);

which can be used to convert any simply recursive function 
(i.e. a function which calls itself directly but not indirectly 
via other functions) into an equivalent nonrecursive func- 
tion [18]. 

Thus suppose a recursive function F is defined by F
ISR b, where F is the only identifier which occurs free in b.
Let F1 be the nonrecursive function defined by F1 IS λF
(b). Then the function (Y F1) can be shown to be equivalent
to F, with the same domain of termination. Moreover,
the expansion of a function designator such as (Y
F1) X by repeated substitution of the definitions of Y and
F1 will never terminate.
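
The construction carries over directly to Python, which also evaluates arguments before application. In the sketch below Y follows the declaration above, while the factorial body chosen for b is a hypothetical example.

```python
def Y(G):
    """Nonrecursive fixed-point function: Y IS λG (U IS λV G(λX (V V) X); U U)."""
    U = lambda V: G(lambda X: V(V)(X))
    return U(U)


# F1 IS λF (b), with a factorial body standing in for b.
F1 = lambda F: (lambda n: 1 if n == 0 else n * F(n - 1))

factorial = Y(F1)   # neither Y nor F1 refers to itself by name
```

Each call of factorial unfolds one further copy of the body at run time, which is why no finite expansion at compile time can remove the calls.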

(3) Storage Allocation. A serious drawback of the 
principle of completeness is the elimination of any run- 
time stack discipline, so that all data storage must be 
recovered by garbage collection. This problem might be 
alleviated by adding language facilities for indicating 
contexts where a stack discipline is applicable. Even with- 
out such facilities, it may be possible to determine by 
program analysis, particularly with appropriate type 
declarations, situations where storage can be recovered 
without garbage collection. 

(4) Side Effects. In the applicative subset of GEDAN- 
KEN, the immediate subexpressions of a function designa- 
tor or a sequence expression can be evaluated in any order, 
or the steps of their evaluation can be intermixed, without 
affecting the result or termination of any program. This 
property, which is obviously desirable for code optimiza- 
tion or multiprocessing, is destroyed by the introduction 
of assignment, since subexpressions can execute interfering 
side effects. 

The situation is exacerbated by the introduction of
label values, since then the order of evaluation can affect 
the number of times a subexpression is executed. The 
program 

(X IS REF 0; (X := INC X, GOTO L); L: VAL X)

produces one with left-to-right evaluation of the sequence 
expression, but produces zero with right-to-left evaluation.
Label-valued references lead to more paradoxical pro- 
grams, such as 

(X IS REF 0; L IS REF 0; M IS REF 0; L := L1;
 (X := INC X, (M := M1; M1: GOTO L));
 L1: L := L2; GOTO M; L2: VAL X)

which produces one with left-to-right evaluation, zero with 
right-to-left evaluation, and possibly two with intermixed 
evaluation. 
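
The first of these two programs can be imitated in Python by modeling GOTO L as an exception. An exception can only jump outward, so this sketch does not extend to the second program, which re-enters a sequence expression through a label-valued reference. The function run and its order parameter are our own devices for choosing the evaluation order.

```python
class Goto(Exception):
    """Models GOTO L; unlike a label value it can only jump outward."""


def run(order):
    x = [0]                      # one-cell list standing in for X IS REF 0

    def increment():             # X := INC X
        x[0] += 1
        return x[0]

    def jump():                  # GOTO L
        raise Goto()

    components = [increment, jump]
    try:
        for index in order:      # evaluate the sequence expression
            components[index]()
    except Goto:
        pass                     # control arrives at L
    return x[0]                  # L: VAL X


left_to_right = run([0, 1])
right_to_left = run([1, 0])
```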

This problem is common to a wide variety of languages. 
One either imposes a fixed order of evaluation, as in
Algol 60 or GEDANKEN, or permits a significant class
of well-formed programs to have indeterminate inter- 
pretations, as in Algol 68 or PL/I. But a more flexible 
approach might be possible, e.g. a limited form of impera- 
tive features which could be added to an applicative lan- 
guage without destroying order-of-evaluation indepen- 
dence. 

(5) Other Label-Value Problems. Label-valued refer- 
ences can easily cause the preservation of data which will 
no longer be accessed by a computation. If L is a label- 
valued reference, then GOTO L will cause execution to 
proceed from the computational state denoted by L. But
the unchanged state must also be saved in case GOTO L is 
executed again before the value of L is changed. If, in fact, 
such a repeated jump cannot occur, then information
will be saved unnecessarily unless the programmer goes to 
the trouble of resetting L immediately after the original 
jump. (As an example, the program for linking the co- 
routines COMPILE and ASSEMBLE will preserve the 
states of these routines unnecessarily.) 

Presumably, it would be better to force the programmer 
to extra trouble in order to preserve, rather than discard, 
a reactivated computational state. This might be accom- 
plished by adapting the concept of "process" used in 
simulation languages, and providing a basic function for
copying processes. However, it is not clear how to combine 
the process concept with an ALGOL-like use of label values 
in a clean manner which does not violate the principle of 
completeness. 

A further difficulty is the inability of a label value to 
preserve the values of references (i.e. the memory). In the 
nondeterministic parser described earlier, the restriction 
on the use of references in the function PARSE arises from
this problem. 

(6) Secondary Storage and File Management. Even with 
open functions and sophisticated code optimization, it 
may be intolerably inefficient to impose a purely functional 
approach on all data structures. But the functional ap- 
proach still holds considerable promise for the treatment of 
large structures which require secondary storage. A stated, 
but usually unmet goal of most data management systems
is the complete separation of the logical properties of a file
from its physical representation. A natural approach to
this goal would be to equate a logical file with a collection
of functions for accessing the file, and to permit these 
functions to be implicit. 